
    Concept explainability for plant diseases classification

    Plant diseases remain a considerable threat to food security and agricultural sustainability. Rapid and early identification of these diseases has become a significant concern, motivating several studies to rely on increasing global digitalization and recent advances in computer vision based on deep learning. Plant disease classification based on deep convolutional neural networks has indeed shown impressive performance. However, these methods have yet to be adopted globally due to concerns regarding their robustness and transparency, and their lack of explainability compared with their human expert counterparts. Methods such as saliency-based approaches, which associate the network output with perturbations of the input pixels, have been proposed to give insights into these algorithms. Still, they are not easily comprehensible or intuitive for human users and are prone to bias. In this work, we deploy a method called Testing with Concept Activation Vectors (TCAV) that shifts the focus from pixels to user-defined concepts. To the best of our knowledge, our paper is the first to employ this method in the field of plant disease classification. Important concepts such as color, texture, and disease-related concepts were analyzed. The results suggest that concept-based explanation methods can significantly benefit automated plant disease identification. Comment: Accepted at VISAPP 202
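    The TCAV procedure is compact enough to sketch. Below is a minimal, illustrative Python version of the core idea (not the authors' code): fit a linear classifier between layer activations of concept examples and of random counterexamples, take the classifier's normal vector as the concept activation vector (CAV), and report the fraction of test images whose class-logit gradient has a positive directional derivative along the CAV. All array names, shapes, and the synthetic demo data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear separator between concept and random activations;
    its unit normal vector is the concept activation vector (CAV)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(grads: np.ndarray, v: np.ndarray) -> float:
    """Fraction of test images whose class-logit gradient (one row per
    image, taken at the same layer) points along the CAV."""
    return float(np.mean(grads @ v > 0))

# Demo with synthetic activations/gradients of width 128.
rng = np.random.default_rng(0)
v = cav(rng.normal(1.0, 1.0, size=(40, 128)),   # concept examples
        rng.normal(0.0, 1.0, size=(40, 128)))   # random counterexamples
score = tcav_score(rng.normal(size=(50, 128)), v)
print(f"TCAV score: {score:.2f}")  # ~0.5 here: random gradients carry no signal
```

    In a real setting, the activations and gradients would come from a trained plant-disease classifier at a chosen layer, and scores would be compared against CAVs fit on random/random splits to test for significance.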

    Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review

    Knowledge graphs (KGs) are widely used for representing and organizing structured knowledge in diverse domains. However, the creation and upkeep of KGs pose substantial challenges. Developing a KG demands extensive expertise in data modeling, ontology design, and data curation. Furthermore, KGs are dynamic, requiring continuous updates and quality control to ensure accuracy and relevance. These intricacies contribute to the considerable effort required for their development and maintenance. One critical dimension of KGs that warrants attention is reproducibility. The ability to replicate and validate KGs is fundamental for ensuring the trustworthiness and sustainability of the knowledge they represent. Reproducible KGs not only support open science by allowing others to build upon existing knowledge but also enhance transparency and reliability in disseminating information. Despite the growing number of domain-specific KGs, a comprehensive analysis of their reproducibility has been lacking. This paper addresses this gap by offering a general overview of domain-specific KGs and comparing them based on various reproducibility criteria. Our study, spanning 19 different domains, shows that only eight out of 250 domain-specific KGs (3.2%) provide publicly available source code. Among these, only one system could successfully pass our reproducibility assessment (14.3%). These findings highlight the challenges and gaps in achieving reproducibility across domain-specific KGs. Our finding that only 0.4% of published domain-specific KGs are reproducible shows a clear need for further research and a shift in cultural practices.

    Mobile Databases - Today, Tomorrow, and in 20 Years. Proceedings of the 8th Workshop of the GI Working Group "Mobile Databases and Information Systems", held on 28 February 2005 as part of BTW 2005 in Karlsruhe

    The workshop "Mobile Databases - Today, Tomorrow, and in 20 Years" is the eighth workshop of the GI Working Group "Mobile Databases and Information Systems". It takes place within BTW 2005, the GI conference on database systems in business, technology, and the web, from 28 February to 1 March 2005 in Karlsruhe. The workshop program comprises two invited talks and seven scientific contributions selected by the program committee from the submissions. For the second workshop day, which is intended for intensive discussion, two further submissions were selected as a basis for discussion. In terms of content, the workshop spans a wide arc: from almost classical questions at the core of mobile databases, such as transaction processing in these systems, to new multimedia applications on mobile devices, and from query processing in ad hoc networks to an analysis of the state of the art in designing mobile applications. This breadth reflects the breadth of the questions that surface when considering mobile information use. With our workshop, we hope to contribute to a better understanding of these questions and to offer a forum for the exchange of questions, solution approaches, and problems between practitioners and researchers from academia.

    Provenance-based Semantic Approach for the Reproducibility of Scientific Experiments

    Data provenance has become an integral part of the natural sciences, where data flow through several complex steps of processing and analysis to generate intermediate and final results. To reproduce scientific experiments, scientists need to understand how the steps were performed in order to check the validity of the results. Scientific experiments consist of activities in the real world (e.g., wet lab or field work) and activities in cyberspace. Many scientists now write scripts as part of their field research for different tasks, including data analysis, statistical modeling, numerical simulation, computation, and visualization of results. Reproducibility of the computational and non-computational parts is an important step towards reproducibility of the experiment as a whole. To reproduce results, or to trace an error in the output, one needs to know which input data were responsible for the output, the steps involved in generating them, the devices and materials used, the settings of those devices, the dependencies, the agents involved, and the execution environment. The aim of our work is to semantically describe the provenance of the complete execution of a scientific experiment in a structured form using linked data, without requiring the scientist to worry about any underlying technologies. We propose an approach that ensures this reproducibility by collecting the provenance data of the experiment and using the REPRODUCE-ME ontology, which extends existing W3C vocabularies, to describe the steps performed in an experiment and their sequence. The ontology describes a scientific experiment along with its steps, its input and output variables, and their relationships with each other. The semantic layer on top of the captured provenance, combined with ontology-based data access, allows scientists to understand and visualize the complete path taken in a computational experiment along with its execution environment. We also provide a provenance-based semantic approach that captures data from interactive notebooks in a multi-user environment provided by JupyterHub and semantically describes them using the REPRODUCE-ME ontology.
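    To illustrate what such a structured, linked-data description might look like, here is a small rdflib sketch recording one experiment step with its input, output, agent, and a device setting. The W3C PROV terms are real; the REPRODUCE-ME namespace URI, the repr:Experiment class, and the repr:hasSetting property are assumptions used purely for illustration, not terms quoted from the paper.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Namespaces: PROV is the real W3C vocabulary; the REPRODUCE-ME URI and
# terms below are illustrative assumptions.
REPR = Namespace("https://example.org/reproduce-me#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/lab/")

g = Graph()
g.bind("repr", REPR)
g.bind("prov", PROV)
g.bind("ex", EX)

# One experiment with a single step, its input data, output, agent,
# and a device setting.
g.add((EX.exp1, RDF.type, REPR.Experiment))
g.add((EX.step1, RDF.type, PROV.Activity))
g.add((EX.step1, PROV.used, EX.raw_images))            # input data
g.add((EX.result, PROV.wasGeneratedBy, EX.step1))      # output
g.add((EX.step1, PROV.wasAssociatedWith, EX.alice))    # agent
g.add((EX.step1, REPR.hasSetting, Literal("exposure=200ms")))

print(g.serialize(format="turtle"))
```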

    Data lifecycle is not a cycle, but a plane!

    Most data-intensive scientific domains, e.g., the life, natural, and geo sciences, have come up with data lifecycles. These cycles feature, in various ways, a set of core data-centric steps, e.g., planning, collecting, describing, integrating, analyzing, and publishing. Although they differ in the steps they identify and the execution order, they collectively suffer from a collection of shortcomings. They mainly promote a waterfall-like model of sequentially executing the lifecycle's steps. For example, the lifecycle used by DataONE suggests that "analyze" happens after "integrate". However, in practice, a scientist may need to analyze data without performing the integration. In general, scientists may not need to accomplish all the steps. Also, in many cases, they simply jump from, e.g., "collect" to "analyze" in order to evaluate the feasibility and fitness of the data, and then return to the "describe" and "preserve" steps. This causes the cycle to gradually turn into a mesh. Indeed, this problem has been recognized and dealt with by the GFBio and USGS data lifecycles. The former has added a set of direct links between non-neighboring steps to allow shortcuts, while the latter has factored out cross-cutting steps, e.g., "describe" and "manage quality", and argued that these tasks must be performed continually across all stages of the lifecycle. Although the aforementioned lifecycles have recognized these issues, they do not offer customization guidelines based on, e.g., project requirements, resource availability, priority, or effort estimates.
    In this work, we propose a two-dimensional Cartesian-like plane, in which the x- and y-axes represent phases and disciplines, respectively. A phase is a stage of the project with a predefined focus that leads the work towards achieving a set of targeted objectives in a specific timespan. We identify four phases: conception, implementation, publishing, and preservation. Phases can be repeated in a run and do not need to have equal timespans. However, each phase should satisfy its exit criteria before the project can proceed to the next phase. A discipline, on the vertical axis, is a set of correlated activities that, when performed, makes measurable progress in the data-centric project. We have incorporated these disciplines: plan, acquire, assure, describe, preserve, discover, integrate, analyze, maintain, and execute. An execution plan is developed by placing the required activities in their respective disciplines' lanes on the plane. Each task (activity instance) is visualized as a rectangle whose width and height respectively indicate the duration and the effort estimate needed to complete it. The phases, as well as the characteristics of the project (requirements, size, team, time, and budget), may influence these dimensions. A discipline or an activity may be utilized several times in different phases. For example, a planning activity gains more weight in conception and fades out over the course of the project, while analysis activities start in mid-conception, get full focus in implementation, and may still need some attention during the publishing phase. Also, multiple activities of different disciplines can run in parallel. However, each task's objective should remain aligned with the phase's focus and exit criteria.
    For instance, an analysis task in the conception phase may utilize multiple methodologies to perform experimentation on a small sample of a designated dataset, while the same task in the implementation phase conducts a full-fledged analysis using the chosen methodology on the whole dataset.
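    A minimal sketch of how such a plane could be represented in software follows; the phase and discipline names are taken from the abstract, while the Task data model and the example plan are illustrative assumptions.

```python
from dataclasses import dataclass

PHASES = ["conception", "implementation", "publishing", "preservation"]
DISCIPLINES = ["plan", "acquire", "assure", "describe", "preserve",
               "discover", "integrate", "analyze", "maintain", "execute"]

@dataclass
class Task:
    """One rectangle on the plane: x = phase, y = discipline lane,
    width = duration, height = estimated effort."""
    name: str
    phase: str
    discipline: str
    duration_weeks: float
    effort_pm: float  # person-months

# Illustrative plan: the "analyze" discipline recurs across phases
# with different weight, as the abstract describes.
plan = [
    Task("data management plan", "conception", "plan", 3, 1.0),
    Task("pilot analysis on small sample", "conception", "analyze", 2, 0.5),
    Task("full analysis, chosen method", "implementation", "analyze", 8, 3.0),
    Task("deposit dataset with metadata", "publishing", "describe", 2, 0.5),
]

for phase in PHASES:  # walk the x-axis phase by phase
    for t in (t for t in plan if t.phase == phase):
        print(f"{phase:>14} | {t.discipline:<8} | {t.name} "
              f"({t.duration_weeks} wk, {t.effort_pm} PM)")
```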

    Engineering incentive schemes for ad hoc networks: a case study for the Lanes overlay

    In ad hoc networks, devices have to cooperate in order to compensate for the absence of infrastructure. Yet autonomous devices tend to abstain from cooperation in order to save their own resources. Incentive schemes have been proposed as a means of fostering cooperation under these circumstances. In order to work effectively, an incentive scheme needs to be carefully tailored to the characteristics of the cooperation protocol it is meant to support. This is a complex and demanding task, yet up to now engineers have been given virtually no help in designing incentive schemes. Even worse, there exists no systematic investigation into which characteristics should be taken into account and what they imply. Therefore, in this paper, we propose a systematic approach to the engineering of incentive schemes. The suggested procedure comprises the analysis and adjustment of the cooperation protocol, the choice of appropriate incentives for cooperation, and guidelines for the evaluation of the incentive scheme. Finally, we show how the proposed procedure is successfully applied to a service discovery overlay.

    Navigating the long tail - Towards practical guidance for researchers on how to select a repository for long tail data

    With nearly 2,000 entries in the Registry of Research Data Repositories (re3data.org, November 2017), researchers are confronted with a plethora of repositories for depositing research data. Given the diversity of these services, we have noticed that researchers find it challenging to make an informed decision, especially when they are dealing with data from the so-called "long tail" (small, diverse, individual, less standardized data). Although re3data.org provides a very comprehensive list of criteria (i.e., filters) to narrow down the number of choices, advice is still needed, for example, on evaluating the importance of a criterion (e.g., type of repository) or the impact of a certain choice (e.g., which PID?). In this poster presentation, we take the perspective of the research data management helpdesk, a central service facility at the Friedrich Schiller University in Jena (Germany), and investigate how we could address this selection challenge. The aim is to develop a practical guide for researchers from domains where there is no obvious choice or well-established repository available (i.e., the long tail) and where researchers rely on general-purpose repositories. In a first step, we compared five generic repositories for long-tail data (Figshare, Zenodo, Dryad, RADAR, Digital Library Thuringia) using the individual descriptions and properties on re3data.org. For some criteria, the information content in re3data.org was rather limited, so we also explored the individual websites of the repository providers. For example, the criterion "Quality Management" only states whether a repository provider does quality management, but not what exactly that means. Another example of rather sparse information is the level of data curation available and applied to the data in a certain repository. Such information would be helpful in the evaluation process. In a second step, we took a number of real cases from our work at the helpdesk and investigated the match between the researcher's intentions and expectations and the means and information available to evaluate a repository (both at re3data.org and on the repository website). This might be straightforward, for example, if the intention is to make data citable, where one needs to check whether a PID is provided. But it might be more difficult, for example, if a researcher would like to assess the visibility a dataset may gain from being published with a certain repository. In this case, one has to look at a number of properties with rather technical information (e.g., Metrics, Syndications, API, Licences).
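    One possible shape for such practical guidance is a simple weighted matching between a researcher's intentions and repository properties, sketched below. The repository property values are placeholders rather than actual re3data.org records, and the criteria and weights are illustrative assumptions.

```python
# Hypothetical decision aid: score repositories against weighted criteria.
# All property values are placeholders, NOT real re3data.org data.
repositories = {
    "Zenodo":   {"pid": True, "curation": False, "api": True},
    "Dryad":    {"pid": True, "curation": True,  "api": True},
    "Figshare": {"pid": True, "curation": False, "api": True},
}

# Example intention, "make data citable, with some curation", as weights:
weights = {"pid": 3, "curation": 2, "api": 1}

def score(props: dict) -> int:
    """Sum the weights of all criteria the repository satisfies."""
    return sum(w for criterion, w in weights.items() if props.get(criterion))

for name, props in sorted(repositories.items(),
                          key=lambda kv: score(kv[1]), reverse=True):
    print(f"{name}: {score(props)} / {sum(weights.values())}")
```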